version 18
clear
set more off
capture log close

* Use present working directory (data file must be in this directory)
pwd 

log using Analysis_of_DF_anly_1_LOG.log, text replace

* Marine Pay to Release Program RCT Analysis Code.do
* Created by Paul J Ferraro, 22 March 2024

*   This code was developed to analyze the data set "DF_anly_1.rds", created on
* 3/14/24 by Thomas Pienkowski, which Ferraro converted into a csv file for 
* analysis. The .rds data file was created by the R project "RCT_project".

* Data file DF_anly_1.csv and this Stata do file are available at 
* https://osf.io/b27ja/

import delimited DF_anly_1.csv

****************************************************************************
**************             DATA DICTIONARY                        **********
****************************************************************************

* boat = vessel identification number (1,2,...,85,86)
* phase_exp = Period number (P1,P2,P3,P4)
* village = village identification number
* site = regency (TL=East Lombok, AJ= Aceh Jaya)
* wf_count = retained wedgefish (for the period)
* hh_count = retained hammerheads (for the period)
* treatment = treatment condition (=1 if treated, i.e., offered a live release
*      payment; =0 if control, i.e., not offered a payment)
* day_count = number of days in the period for this vessel
* wf_count_norm = wf_count/day_count
* hh_count_norm = hh_count/day_count
* recipient = 1 if vessel received at least one live release payment during
*      the experiment, =0 otherwise

* Create numeric variables for the string variables.

egen villagen = group(village)

*   In the original data file, "village" was a string variable and identified
* the village name. In the final data file used in the analysis, villages were
* deidentified and assigned a numerical ID number. Rather than delete villagen 
* and edit all the subsequent code, we left the variable in the code. A reader
* can see that the values for village equal the values for villagen.
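
*   A minimal sketch of how to confirm that claim: in a two-way table, every 
* observation should fall on the diagonal. This check is an illustrative 
* addition, not part of the original analysis.

tabulate village villagen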

egen siten = group(site)

* 1=Aceh Jaya (AJ), 2=East Lombok (TL)

egen vesseln = group(boat)

*   The vesseln variable isn't necessary but is left over from when "boat" was
* a string variable that identified the vessel name. But in the final data file,
* "boat" has been deidentified and assigned a numerical ID number. Rather than 
* delete this variable and edit all subsequent code, we left it in the code.

egen phasen = group(phase_exp)

*   villagen 1,4 and 5 had 4-period designs and villagen 2 and 3 had 3-period
* designs. See Material and Methods (M&M) section of final publication.

*   Next, tell Stata that these data are panel data at the vessel level and 
* that phasen values are the time periods in order.

xtset vesseln phasen

*   Sort data by vessel and then, by time period within vessel, because the 
* data are easier to observe in this format and one can easily calculate 
* serial autocorrelations, if desired.

sort vesseln phasen
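
*   A minimal sketch of one such calculation, offered as an illustration and
* not as part of the published analysis: the first-order serial correlation of
* retained catch, using the lag operator that -xtset- makes available. Lags are
* missing where a vessel has no observation in the preceding period.

correlate hh_count L.hh_count
correlate wf_count L.wf_count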

*   Let's create some more intuitive labels for the variables

label variable siten "Regency"
label variable phasen "Period"
label variable villagen "Village"
label variable treatment "Offer Payment"
label variable hh_count "Hammerhead Retained Catch"
label variable wf_count "Wedgefish Retained Catch"
label variable hh_count_norm "Hammerhead Retained Catch/Day"
label variable wf_count_norm "Wedgefish Retained Catch/Day"
label variable day_count "Period Length in Days"
label variable vesseln "Vessel ID"

*   Summary statistics from M&M and Supplemental Materials (SM) come after the
* estimation of average effects, the robustness checks, and the exploratory
* analyses.


****************************************************************************
*****           ESTIMATION OF AVERAGE TREATMENT EFFECTS                *****
****************************************************************************

******             ESTIMAND (TARGET CAUSAL PARAMETER)               ********

*   See M&M "Estimation Procedures" for description of the estimand. 
* Here we summarize key points.

*  A vessel is exposed to the *treatment* when the conservation program field 
* team offers vessels Regency-specific, fixed amounts of money for every live 
* hammerhead shark or wedgefish released from nets or lines and documented 
* with a project-provided video camera (the amounts of money are described in 
* the publication's main text).

*  The *outcome* is the count of retained catch; i.e., dead hammerheads 
* (hh_count) and dead wedgefish (wf_count).

*	We want to estimate an Average Treatment Effect (ATE), in percentage terms,
* defined as an expected effect on retained catch from picking a vessel at
* random and offering it pay-to-release payments. If R is retained fish, then 
* the potential retained fish catch under the treated condition is R(1) and the 
* potential retained fish catch under control condition is R(0). Thus, 
* ATE% = (E[R(1)-R(0)])/E[R(0)], where E[*] is the expectation operator.

*  One can also view this estimand as the average percent change in retained
* catch as a result of offering live release payments, or equivalently the 
* average percent change in retained catch when all vessels are switched from
* the control condition to treated condition.

 
******                   ESTIMATOR OF ATE%                         **********
 
*   See M&M "Estimation Procedures" for description of the estimator. Here we 
* summarize key points associated with the coding syntax used in the analysis.
 
*   To estimate ATE%, we use the following syntax:

* xtpoisson hh_count treatment i.phasen#i.villagen, 
*            irr vce(cluster vesseln) exposure(day_count)

*   -xtpoisson- calls the Poisson estimator for panel data ("panel data" = 
* repeated observations on same units over time). The default with this command
* is for vessel-level random effects to follow a Gamma distribution.

*   The "#" symbol creates a period-village interaction term between each
* period dummy variable (phasen) and each village dummy variable (villagen).
*   Recall that two villages were added after the first three villages started
* and thus these two villages have no Period 4. Stata drops those missing
* interaction terms.
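
*   A minimal sketch (an illustrative addition) for seeing which period-village
* cells contain data: empty cells in the table below correspond to the
* interaction terms that Stata drops.

tabulate phasen villagen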

*   The ATE% equals the Incidence Rate Ratio (IRR) minus 1, and thus -irr- is
* included as an option. The IRR is the exponentiated coefficient of the 
* variable; e.g., if the coefficient of the treatment variable in the
* regression model is 0.381, then IRR = exp(0.381) = 1.464 and 
* ATE% = IRR - 1 = 0.464; i.e., a 46.4% increase.
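
*   The conversion can be sketched with -display-. The coefficient 0.381 is
* the illustrative value used above, not an estimate from these data.

display "IRR  = " %5.3f exp(0.381)
display "ATE% = " %5.1f 100*(exp(0.381) - 1)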

*   -vce()- calls cluster-robust SE estimator, where the cluster is the vessel
* and the Huber/White/sandwich estimator is calculated for the coefficients 
* estimated in this regression (see Wooldridge 2020 for details). VCE stands for
* variance–covariance matrix of the estimators.

*   -exposure(day_count)- adjusts the estimator for differences in the length of 
* time a vessel was exposed to the condition in that period.
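
*   As a sketch of what the option does: -exposure(day_count)- adds
* ln(day_count) to the linear predictor with its coefficient constrained to 1,
* so the model describes retained catch per day. The same model can be written
* with -offset()- and a logged variable. The variable name ln_day_count is
* hypothetical and is not used elsewhere in this file.

generate double ln_day_count = ln(day_count)

* xtpoisson hh_count treatment i.phasen#i.villagen, 
*            irr vce(cluster vesseln) offset(ln_day_count)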

*   We can also formulate the Poisson estimator as a multi-level, mixed-effects 
* model, and we do so later in the code.

*   At the end of this code under "Other Analyses", we argue why the assumption
* of random effects following a Gamma distribution is preferred to assuming that 
* they follow a normal distribution (reason: better correspondence between 
* observed and predicted values). Regardless of the assumed distribution, the 
* estimated ATE% and the conclusions about the presence of countervailing 
* pathways are the same, as shown below in "Robustness Checks".
  
*   If treatment (T) is randomized, then ATE% = E[R(1)-R(0)]/E[R(0)] = 
* (E[R(1)] - E[R(0)])/E[R(0)] = (E[R|T=1] - E[R|T=0])/E[R|T=0] = 
* exp(b) - 1, where exp is the exponential function, | means "conditional on",
* and b is the coefficient of the treatment variable in the Poisson regression 
* model. In other words, we can use observable quantities to estimate 
* counterfactual quantities (E[R(0)|T=0] = E[R(0)|T=1] = E[R(0)] and
* E[R(1)|T=1] = E[R(1)|T=0] = E[R(1)]).


******    CONVENTIONAL MONITORING AND EVALUATION (M&E) APPROACH     **********

*   In the analysis, we'll compare the ATE% from the experimental estimator
* with the ATE% from the conventional M&E approach that uses equation (1) in 
* the M&M section.

*      (Total Live Releases)/(Total Retained Catches + Total Live Releases)

* 	WEDGEFISH
*    Total Live Releases of Wedgefish During Treated Periods = 475
*    Total WF Retained During Treated Periods = 198
*    475/(198+475) = 70.58% reduction in wedgefish deaths.

* 	HAMMERHEADS
*    Total Live Releases of Hammerheads During Treated Periods = 364
*    Total HH Retained During Treated Periods = 7830
*    364/(7830+364) = 4.44% reduction in hammerhead deaths.

* These values are presented in Figure 2 panel A and panel B in the publication.
*   They can be obtained from running HH_Retained and Released Final.do and 
* WF_Retained and Released Final.do, located at https://osf.io/b27ja/
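
*   The arithmetic above can be sketched with -display-, using the totals 
* listed above. The last two lines convert each benchmark into the implied 
* Poisson coefficient, ln(1 - share released), for comparison with the 
* regression estimates.

display "Wedgefish:  " %5.2f 100*475/(198+475) "% reduction"
display "Hammerhead: " %5.2f 100*364/(7830+364) "% reduction"
display "Wedgefish coefficient:  " %7.4f ln(1 - 475/(198+475))
display "Hammerhead coefficient: " %7.4f ln(1 - 364/(7830+364))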


*******************************************************************************
******          EXPERIMENTAL ESTIMATE OF HAMMERHEAD ATE%               ********
*******************************************************************************

  ////////////////////////////////////////////////////////////////////////////
  //	 Estimated Effect of Live Release Program on Retained Hammerheads ////
  ////////////////////////////////////////////////////////////////////////////

*   Retained catch per period is the variable hh_count

xtpoisson hh_count treatment i.phasen#i.villagen, irr vce(cluster vesseln) exposure(day_count) nolog

*   The estimated IRR and CI for the treatment variable in the regression above
* are reported in Figure 2 and main text. 

*   NOTE 1: If the iteration log from the MLE is desired, remove the -nolog- 
* syntax as an option after the comma.

*   To assess the evidence of countervailing pathways, we compare the 
* estimated experimental effect size to the live release benchmark value from
* the conventional M&E approach.

*   We can see from the regression output that the 95% CI for the treatment
* variable does not include the ATE%(Conventional Approach), but we can also
* conduct a formal test of the null hypothesis.

*   Test ATE%(Experimental Estimator) = ATE%(Conventional Approach) = -4.44%, 
* which is equivalent to an IRR = 0.9556 or an estimated coefficient of 
* ln(0.9556)=-0.0454

test treatment== -0.0454

*   This is a Chi2 test of whether the estimated coefficient of the treatment
* variable (not the IRR) is equal to -0.045. We reject the null and report the 
* rejection in the main text.
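
*   As a sketch of what -test- computes here, the Wald statistic can be 
* reproduced by hand from the stored coefficient and its cluster-robust SE.
* The local macro names are hypothetical.

local b0 = -0.0454
local wald = ((_b[treatment] - `b0')/_se[treatment])^2
display "Wald chi2(1) = " %6.2f `wald' "   p-value = " %6.4f chi2tail(1, `wald')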

*******************************************************************************
******          EXPERIMENTAL ESTIMATE OF WEDGEFISH ATE%                ********
*******************************************************************************

  ////////////////////////////////////////////////////////////////////////////
  //	 Estimated Effect of Live Release Program on Retained Wedgefish   ////
  ////////////////////////////////////////////////////////////////////////////

*   Retained catch per period is the variable wf_count.

xtpoisson wf_count treatment i.phasen#i.villagen, irr vce(cluster vesseln) exposure(day_count) nolog

*   The estimated IRR and CI for the treatment variable in the regression above
* are reported in Figure 2 and main text. 

*   To assess the evidence of countervailing pathways, we compare the 
* estimated experimental effect size to the live release benchmark value from
* the conventional M&E approach.

*   We can see from the regression output that the 95% CI for the treatment
* variable does not include the ATE%(Conventional Approach), but we can also
* conduct a formal test of the null hypothesis.

*   Test ATE%(Experimental Estimator) = ATE%(Conventional Approach) = -70.58%, 
* which is equivalent to an IRR=0.2942 or an estimated coefficient of 
* ln(0.2942)=-1.2235

test treatment== -1.2235

*   This is a Chi2 test of whether the estimated coefficient of the treatment
* variable (not the IRR) is equal to -1.22. We reject the null and report the 
* rejection in the main text.


*******************************************************************************
******             CONCLUSION ABOUT ATE% FOR THE TWO SPECIES           ********
*******************************************************************************

*   We can reject ATE%(hh_count) = -4% and ATE%(wf_count) = -71%, and thus we
* have evidence of countervailing mechanisms for both taxa.

* NOTE 2: In SM, we do a quick check to see if our ATE% estimates above are 
* reasonable. We calculate the implied percent change using the raw retained
* count numbers in treated and control conditions divided by the total number
* of days in each of these conditions. In other words, calculate the percent
* change in retained catch per day when moving from control to treated periods
* (ignoring the temporal variation in period starts and stop dates and the 
* unbalanced aspect of the panel).

*   The total number of retained hammerheads during treated periods is 7830,
* and the total number of days is 14269 (source: Summary Statistics below). 
*   The total number of retained hammerheads during control periods is 5717,
* and the total number of days is 13723 (source: Summary Statistics below).
*   Thus the implied percent change in retained catch/day is a 32% increase;
* i.e., (7830/14269)/(5717/13723) = 1.32

*   The total number of retained wedgefish during treated periods is 198,
* and the total number of days is 14269 (source: Summary Statistics below). 
*   The total number of retained wedgefish during control periods is 231,
* and the total number of days is 13723 (source: Summary Statistics below).
*   Thus the implied percent change in retained catch/day is an 18% decrease;
* i.e., (198/14269)/(231/13723) = 0.82

*   Better, we could normalize the counts by day_count and look at the change in 
* retained catch per day in treated and control conditions. Reported in SM
*   For hammerheads, 0.5526/0.4110 = 35% increase.
*   For wedgefish, 1 -(0.013213/0.0154342) = 14% decrease 
* (source: Summary Statistics below). 
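
*   The first of these checks can be sketched with -display-, using the 
* totals quoted above (retained catch per day in treated periods divided by
* retained catch per day in control periods).

display "Hammerhead ratio: " %4.2f (7830/14269)/(5717/13723)
display "Wedgefish ratio:  " %4.2f (198/14269)/(231/13723)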

*   NOTE 3: A reader familiar with non-experimental designs may wonder why we do
* not use a fixed-effects estimator; i.e., fixed effects in the economics 
* literature sense, where the vessel-level effects are parameters to be
* estimated (or removed by taking deviations from means or first-differencing)
* and the estimator only uses the within-vessel variation in the treatment.
*   The reason is that the treatment was randomized, and thus both estimators
* are unbiased (we know that there is no correlation between the error term and
* the treatment variable). But the fixed-effects estimator is less efficient
* (estimates are less precise) because it ignores the between-vessel variation
* (which also leads it to drop the few vessels that do not have time-varying 
* treatment conditions). Both estimators, however, yield similar inferences
* about the ATE% and the presence of countervailing pathways. One can run the
* code below and confirm these claims.

*   -ppmlhdfe- is a community-contributed command (ssc install ppmlhdfe).

* ppmlhdfe hh_count treatment i.phasen#i.villagen, irr a(vesseln) 
*                        cluster(vesseln) exposure(day_count)

* ppmlhdfe wf_count treatment i.phasen#i.villagen, irr a(vesseln) 
*                        cluster(vesseln) exposure(day_count)
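
*   A sketch of the same check using only built-in commands, under the 
* assumption that the conditional fixed-effects Poisson estimator is an 
* acceptable stand-in for -ppmlhdfe-. This is an illustrative addition, not 
* the specification used in the publication; -capture noisily- keeps the do 
* file running if a model fails to converge.

capture noisily xtpoisson hh_count treatment i.phasen#i.villagen, fe irr vce(robust) exposure(day_count) nolog
capture noisily xtpoisson wf_count treatment i.phasen#i.villagen, fe irr vce(robust) exposure(day_count) nolog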



****************************************************************************
*******             MISSINGNESS OF RETAINED CATCH                  *********
****************************************************************************

*   Statistics about missingness (attrition) of retained catch were reported in
* the M&M and SM sections of the publication. Recall that a period is considered
* missing if the vessel never came to the landing site during the period (i.e., 
* every daily value in the field team's database for that period is "NA").

*   Describe patterns in the data, allow up to 15 patterns to be listed 

xtdescribe, patterns(15)

*   Observation in M&M: 79 out of 86 vessels are observed in both the treated 
* and control conditions. 7 vessels are only observed in one or the other
* condition.

* NOTE 4: As noted in the M&M, in processing the data, one boat was dropped 
* because it had NA reported for retained catch landings in all periods. Thus 
* N=86 for the analysis (whereas N=87 vessels were in original experiment).

* Patterns by village, which were used for Figure SM3.

* Let's look at the three villages that could have up to 4 periods of landings

xtdescribe if (villagen==1 | villagen==4 | villagen==5), patterns(15)

* Let's look at the two villages that could have up to 3 periods of landings

xtdescribe if (villagen==2 | villagen==3), patterns(15)

*   Observations in SM:
*   29 vessels in villages 1, 4, and 5 and 23 vessels in villages 2 and 3
* have zero periods in which there were no landings recorded by the field team.
* 34 vessels are missing one or more periods (39%). 16 of them are missing one
* period. So 4 out of 5 vessels have all of their periods or all but one.
* 13 of the 34 vessels with missing periods are missing two periods.

* Another perspective based on summary stats from above.
*      If we had landings for every boat across 4 periods, we
*   would have 86*4 = 344 observations. So 83 observations are missing. Of  
*   those, 26 are missing simply because villages 2 and 3 had 3-period designs 
*   (see M&M: 26 vessels were added late, and thus they don't have a Period 4). 
*   23 vessels from other three villages are missing Period 4 observations.
*     The remaining missing observations are distributed roughly equally across 
*   the other three periods.
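
*   The observation count behind this arithmetic can be sketched as follows, 
* where _N is the number of vessel-period observations in memory.

display "Potential vessel-periods: " 86*4
display "Observed vessel-periods:  " _N
display "Missing vessel-periods:   " 86*4 - _N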

*   Is period missingness independent of treatment status? If yes, we would
* expect to have the same number of period observations in treated and control
* conditions.

tabulate treatment

* Observation in M&M
*      Missingness is no more prevalent in treated condition than control 
*    condition. 

* If one wants a formal statistical test.....

prtest treatment == 0.5
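
*   -prtest- treats the vessel-period observations as independent. As a sketch
* of an alternative that allows for within-vessel dependence (an illustrative 
* addition, not a test reported in the publication), one can estimate the 
* treated share with vessel-clustered SEs and test it against 0.5.

regress treatment, vce(cluster vesseln)
test _cons == 0.5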

*   Another check: Confirm that the period lengths are the same, on average, 
* in treated and control conditions.

* Describe period length in days (exposure) for treatment condition

xtsum day_count if treatment==1

* Describe period length in days (exposure) for control condition

xtsum day_count if treatment==0

* Is there a correlation between period length and treatment condition?

xtpoisson day_count treatment, irr vce(cluster vesseln) nolog

* No, there is no correlation between period length and treatment condition

*   Look at missingness patterns for recipients (i.e., vessels that did live 
* releases).

xtdescribe if recipient==1 & (villagen==2 | villagen==3), patterns(15)
xtdescribe if recipient==1 & (villagen==4 | villagen==1), patterns(15)

*   Observation in SM: For the payment recipients, 27 out of 29 (93%) have no 
* missing periods.


///////////////////////////////////////////////////////////////////////////////

*******************************************************************************
******                         ROBUSTNESS CHECKS                       ********
*******************************************************************************

* The study does 6 robustness checks.

* 1. Robustness to Carryover Effects 
* 2. Payment Recipient-only Subgroup Analysis
* 3. Alternative Estimator 1: Multi-level, mixed-effects model with random
*      effects that follow a Gaussian (normal) distribution
* 4. Alternative Estimator 2: A random-effects GLS panel data estimator
* 5. Original Poisson estimator with bootstrapped SEs
* 6. Original Poisson estimator with top 1% of retained catch values removed 
*     (i.e., remove potential "outliers").


*******************************************************************************
******              1. ROBUSTNESS TO CARRYOVER EFFECTS                 ********
*******************************************************************************

*   Carryover effects are behavioral spillovers across periods whereby the 
* condition in prior periods affects behavior in the current period (i.e., a
* form of intertemporal interference in experimental cross-over designs).
*   To assess the potential impact of carryover effects on our conclusions,
* we repeat the original estimation of ATE%, but use only Period 1 data,
* recognizing that statistical power will be much lower without leveraging the
* within-vessel variation in treatment status.

*   First, we need to calculate ATE% using the conventional M&E approach and
* the live releases from only Period 1:
*        (Total Live Releases)/(Total Retained Catches + Total Live Releases)

* 	WEDGEFISH
*    Total Live Releases of Wedgefish = 158
*    Total WF Retained = 79
*    158/(79+159) = 66.39% reduction in wedgefish deaths.

* 	HAMMERHEADS
*    Total Live Releases of Hammerheads = 3
*    Total HH Retained = 2844
*    3/(2844+3) = 0.11% reduction in hammerhead deaths.



******                HAMMERHEAD: PERIOD 1 ONLY                      ********

* Estimate ATE% using Period 1 data only (Poisson model with robust SEs)

poisson hh_count treatment i.phasen#i.villagen if phasen==1, irr vce(robust) exposure(day_count) nolog

*   The estimated IRR and CI for the treatment variable in the regression above
* are reported in SM Table #. 

*   To assess the evidence of countervailing pathways, we compare the 
* estimated experimental effect size to the live release benchmark value from
* the conventional M&E approach.

*   We can see from the regression output that the 95% CI for the treatment
* variable does not include the ATE%(Conventional Approach), but we can
* conduct a formal test of the null hypothesis.

*   Test ATE%(Experimental Estimator) = ATE%(Conventional Approach) = -0.11%, 
* which is equivalent to an IRR = 0.9989 or an estimated coefficient of 
* ln(0.9989)=-0.0011

test treatment== -0.0011


******                WEDGEFISH: PERIOD 1 ONLY                      ********

* Estimate ATE% using Period 1 data only (Poisson model with robust SEs)

poisson wf_count treatment i.phasen#i.villagen if phasen==1, irr vce(robust) exposure(day_count) nolog

*   The estimated IRR and CI for the treatment variable in the regression above
* are reported in SM Table #. 

*   To assess the evidence of countervailing pathways, we compare the 
* estimated experimental effect size to the live release benchmark value from
* the conventional M&E approach.

*   We can see from the regression output that the 95% CI for the treatment
* variable does not include the ATE%(Conventional Approach), but we can
* conduct a formal test of the null hypothesis.

*   Test ATE%(Experimental Estimator) = ATE%(Conventional Approach) = -66.39%, 
* which is equivalent to an IRR = 0.3361 or an estimated coefficient of 
* ln(0.3361)=-1.0903

test treatment==-1.0903


******             CONCLUSION ABOUT ATE%: CARRYOVER EFFECTS            ********

*   We conclude that carryover effects in the full sample are not a rival
* explanation for our conclusions. Such effects may exist, but they are not 
* creating a false or spurious countervailing pathway effect.


*******************************************************************************
******          2. PAYMENT RECIPIENT-ONLY SUBGROUP ANALYSIS            ********
*******************************************************************************

*   We would expect the patterns we observe in the full data set to be stronger
* among the subgroup of vessels that engaged in live releases (and thus 
* received payments).

* Count how many vessels made at least one live release (received >=1 payment).
* -unique- is a community-contributed command (ssc install unique).

unique vesseln if recipient==1

* Count how many observations are from vessels that received >=1 payment 

count if recipient==1
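
*   A sketch of the vessel count using only built-in commands, as an 
* alternative to -unique-. The variable name vessel_tag is hypothetical.

egen byte vessel_tag = tag(vesseln)
count if vessel_tag == 1 & recipient == 1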

* Observations Reported in SM
* 		Recipients account for 29/86 = 34% of vessels
*       Recipients account for 92/261 observations = 35%
* 		Recipients accounted for 3/4 of the retained HH and 2/3 of retained WF
*     during the control periods. From data in files
*     WF_Retained_and_Released_Totals.dta and HH_[.].

*   First, we need to calculate ATE% using the conventional M&E approach and
* the live releases from only payment recipients:

*        (Total Live Releases)/(Total Retained Catches + Total Live Releases)

* 	WEDGEFISH
*    Total Live Releases of Wedgefish = 475
*    Total WF Retained = 143
*    475/(143+475) = 76.86% reduction in wedgefish deaths.

* 	HAMMERHEADS
*    Total Live Releases of Hammerheads = 364
*    Total HH Retained = 6302
*    364/(6302+364) = 5.46% reduction in hammerhead deaths.
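
*   As a sketch, the benchmark shares above and the log-scale coefficients 
* they imply (used as null values in the tests that follow) can be computed 
* with -display-.

display "Wedgefish:  " %5.2f 100*475/(143+475) "%, ln(IRR) = " %7.4f ln(1 - 475/(143+475))
display "Hammerhead: " %5.2f 100*364/(6302+364) "%, ln(IRR) = " %7.4f ln(1 - 364/(6302+364))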


******                HAMMERHEADS: RECIPIENTS ONLY                    ********

* Estimate the ATE% for the hammerhead retained catch among recipients

xtpoisson hh_count treatment i.phasen#i.villagen if (recipient==1), irr vce(cluster vesseln)  exposure(day_count) nolog

*   The estimated IRR and CI for the treatment variable in the regression above
* are reported in SM Table #. 

*   To assess the evidence of countervailing pathways, we compare the 
* estimated experimental effect size to the live release benchmark value from
* the conventional M&E approach.

*   We can see from the regression output that the 95% CI for the treatment
* variable does not include the ATE%(Conventional Approach), but we can
* conduct a formal test of the null hypothesis.

*   Test ATE%(Experimental Estimator) = ATE%(Conventional Approach) = -5.46%, 
* which is equivalent to an IRR = 0.9454 or an estimated coefficient of 
* ln(0.9454)=-0.0562

test treatment== -0.0562


******                WEDGEFISH: RECIPIENTS ONLY                    ********

* Estimate the ATE% for the wedgefish retained catch among recipients

xtpoisson wf_count treatment i.phasen#i.villagen if (recipient==1), irr vce(cluster vesseln)  exposure(day_count) nolog

*   The estimated IRR and CI for the treatment variable in the regression above
* are reported in SM Table #. 

*   To assess the evidence of countervailing pathways, we compare the 
* estimated experimental effect size to the live release benchmark value from
* the conventional M&E approach.

*   We can see from the regression output that the 95% CI for the treatment
* variable does not include the ATE%(Conventional Approach), but we can
* conduct a formal test of the null hypothesis.

*   Test ATE%(Experimental Estimator) = ATE%(Conventional Approach) = -76.86%, 
* which is equivalent to an IRR = 0.2314 or an estimated coefficient of 
* ln(0.2314)=-1.4636

test treatment== -1.4636


******          CONCLUSION ABOUT ATE%: RECIPIENT-ONLY SUBGROUP         ********

*   In comparison to the estimated ATE%s using the full sample, the estimated 
* ATE%s using the subgroup move towards positive values for both taxa, as we
* would expect if the payments are inducing countervailing mechanisms.

*   NOTE 5: Some readers may want to know why we don't use an instrumental 
* variable design where randomization of treatment is the instrument for
* "releasing a fish and taking a payment." We do not need to do so because all
* payment recipients were observed in both treated and control conditions. So,
* if we think of them as a type of "complier" (offered payment and took at least
* one payment), we don't have to estimate the complier average causal effect 
* using randomization as an instrumental variable because we can observe and 
* identify individual compliers in both treated and control states. That's not
* possible in a typical RCT, but it is possible in a cross-over design.


*******************************************************************************
******  3. ALTERNATIVE ESTIMATOR 1: MULTI-LEVEL, MIXED-EFFECTS MODEL   ********
*******************************************************************************

* 	We estimate the ATE% using a multi-level, mixed-effects generalized 
* linear model that assumes the vessel-level random effects follow a Gaussian 
* (normal) distribution rather than a Gamma distribution. 
*   This formulation may be more familiar to ecologists.

*   Stata syntax:
* meglm hh_count i.treatment i.phasen#i.villagen, irr intpoints(12)
*      family(poisson) vce(robust) exposure(day_count)  || vesseln:
*   Or one can use
* mepoisson hh_count i.treatment i.phasen#i.villagen, irr intpoints(12)
*		 vce(robust) exposure(day_count) || vesseln: 

*   In Stata, the default link function for the Poisson family is log.
* Like previous estimators, we estimate cluster-robust SEs.

*   -meglm- and -mepoisson- assume that the vessel-level random effects follow
* a Gaussian (normal) distribution rather than a Gamma distribution. Thus the
* estimated coefficients are similar to what would be obtained from using
* -xtpoisson- with the -normal- option:
* xtpoisson hh_count i.treatment i.phasen#i.villagen, 
* 			normal vce(cluster vesseln) exposure(day_count)
*   Using that syntax, we get the same coefficients and SEs to the 4th decimal 
* place as one obtains for -meglm- and -mepoisson-.

*   To make this GLM estimator more comparable to the -xtpoisson- command, we 
* specify the number of integration points used by the mvaghermite integration 
* method. -xtpoisson- uses 12 points as the default. So we use the syntax 
* -intpoints(12)-


******             HAMMERHEADS: MULTI-LEVEL, MIXED-EFFECTS MODEL        ********

meglm hh_count treatment i.phasen#i.villagen, irr intpoints(12) family(poisson) vce(robust) exposure(day_count) nolog || vesseln:

*   The estimated IRR and CI for the treatment variable in the regression above
* are reported in SM Table #.

******             WEDGEFISH: MULTI-LEVEL, MIXED-EFFECTS MODEL        ********

meglm wf_count i.treatment i.phasen#i.villagen, irr intpoints(12) family(poisson) vce(robust) exposure(day_count) nolog || vesseln:

*   The estimated IRR and CI for the treatment variable in the regression above
* are reported in SM Table #.


******              CONCLUSION: ALTERNATIVE ESTIMATOR 1               ********

*   No difference in the conclusions using a multi-level, mixed-effect model
* specification



*******************************************************************************
**** 4. ALTERNATIVE ESTIMATOR 2: RANDOM-EFFECTS, GLS PANEL DATA ESTIMATOR  ****
*******************************************************************************

*    We estimate the ATE% using a random-effects, Generalized Least Squares
* (GLS) panel data estimator, which is unbiased given the randomization of the 
* treatment (i.e., strict exogeneity is satisfied), but less efficient than the 
* Poisson random-effects panel data estimator we use above (and the GLS 
* estimator won't predict retained catch well, which would affect any 
* calculation of the ATE in levels via an Average Marginal Effect calculation).

*   Satisfying strict exogeneity means that the expected value of the error term 
* conditional on treatment is 0 (E[e | T] = 0). The GLS estimator is 
* therefore unbiased - no other assumptions are required about the 
* underlying probability distribution of the data, but the GLS estimator may 
* not be the most efficient (i.e., have lowest variance).


******                 HAMMERHEADS: GLS ESTIMATOR                    ********

* We can implement this estimator in two ways. 

*   First way: Ignore the small variations in day_count within village and
* just estimate the effect on hh_count (in the Poisson estimators above, adding
* exposure(day_count) has a small effect, at third decimal place, on estimates 
* because the period-village interactions already adjust for variation in 
* day_count: most vessels in a village have the same day_count for a given 
* period).

xtreg hh_count i.treatment i.phasen#i.villagen, re cluster(vesseln)

*   The estimated average treatment effect and CI for the treatment variable in 
* the regression above are reported in SM Table #.

*   We used the factor notation "i.treatment" so we can use the margins command
* to determine the implied percent change in the retained catch.

margins treatment

* An estimated increase of 25% (50/40-1). Reported in the SM text.
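
*   The percent change above can also be computed directly from the margins
* results instead of by hand. A minimal sketch, assuming r(b) from the
* -margins treatment- call above is still in memory, with the control mean in
* column 1 and the treated mean in column 2 (the matrix name PM is ours).

matrix PM = r(b)
display "Implied percent change: " %5.1f 100*(PM[1,2]/PM[1,1] - 1)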

*   The conventional M&E approach implies a reduction of 4.44%, or a reduction
* in the control value by 1.78 hammerheads (from 40.41 to 38.63). We can test
* whether the estimated effect in GLS regression above is different from that
* value.

test 1.treatment==-1.77802372

* We can reject at p<0.05. Reported in the SM text.

*   Second way: Standardize each hh_count value by the day_count for the 
* period, yielding a measure of retained fish per day per vessel during the 
* period (called hh_count_norm).

xtreg hh_count_norm i.treatment i.phasen#i.villagen, re cluster(vesseln)

*   Reported in the SM text that the implied percentage change is no different.

margins treatment

* An estimated increase of 26% (0.48/0.38-1). Reported in the SM text.


******                 WEDGEFISH: GLS ESTIMATOR                    ********

* We can implement this estimator in the same two ways as we did for hammerheads

*   First way: Ignore the small variations in day_count within village

xtreg wf_count i.treatment i.phasen#i.villagen  , re cluster(vesseln)

*   The estimated average treatment effect and CI for the treatment variable in 
* the regression above are reported in SM Table #.

margins treatment

* An estimated decrease of 17% ((1.47/1.78)-1). Reported in the SM text.

*   The conventional M&E approach implies a reduction of 70.58%, or a reduction
* in the control value by 1.26 wedgefish (from 1.78 to 0.52). We can test
* whether the estimated effect in GLS regression above is different from that
* value.

test 1.treatment==-1.26

* We can reject at p<0.01. Reported in the SM text.

*   Second way: Standardize each wf_count value by the day_count

xtreg wf_count_norm i.treatment i.phasen#i.villagen, re cluster(vesseln)

*   Reported in the SM text that the implied percentage change is no different.

margins treatment

* An estimated decrease of 12% ((0.013/0.015)-1). Reported in the SM text.


******              CONCLUSION: ALTERNATIVE ESTIMATOR 2               ********

*   No difference in the conclusions using a random-effects, GLS estimator.



*******************************************************************************
**** 5. ALTERNATIVE ESTIMATOR 3: POISSON ESTIMATOR WITH BOOTSTRAPPED SEs   ****
*******************************************************************************

*   Use the same estimator as in the original analysis, but rather than estimate
* the SEs via the cluster-robust VCE estimator, estimate them via cluster
* bootstrapping, which puts even less structure on the variance-covariance
* matrix values. Use 1,000 replications (using more has little effect, changes
* at third or fourth decimal place for SE estimate; to speed up calculations
* users can change to reps(500) with little effect).

*   To ensure a user of this code can replicate the bootstrapping procedure, we 
* set a seed

set seed 06041968

******                 HAMMERHEADS: BOOTSTRAPPED SEs                  ********

xtpoisson hh_count treatment i.phasen#i.villagen, irr vce(bootstrap, reps(1000)) exposure(day_count) nolog

*   The estimated IRR and CI for the treatment variable in the regression above
* are reported in SM Table #. 

test treatment== -0.0454

* Reported in the SM text.

******                 WEDGEFISH: BOOTSTRAPPED SEs                  ********

xtpoisson wf_count treatment i.phasen#i.villagen, irr vce(bootstrap, reps(1000)) exposure(day_count) nolog

test treatment== -1.2235
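
*   The null values in the two tests above are the conventional M&E estimates
* expressed on the log (coefficient) scale. A quick check, assuming the
* conventional reductions of 4.44% (hammerheads) and 70.58% (wedgefish)
* quoted in the GLS section above:

display "Hammerheads: ln(1 - 0.0444) = " %7.4f ln(1 - 0.0444)
display "Wedgefish:   ln(1 - 0.7058) = " %7.4f ln(1 - 0.7058)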



******              CONCLUSION: ALTERNATIVE ESTIMATOR 3               ********

*   No difference in the conclusions using bootstrap estimation for SEs.


******************************************************************************
******               6. ROBUSTNESS TO OUTLIERS                        ********
******************************************************************************

*   The sample size is large enough that outliers should not have a strong 
* influence on the results, but to do a simple check, we remove the top 1% of
* observations in the upper tail of the retained catch distribution for each
* species.

******                 HAMMERHEADS: OUTLIERS                         ********

summarize hh_count, detail

* 99th percentile is 568
* Drop all observations at or above this value and re-estimate
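
*   The cutoff can also be read from the stored results rather than typed by
* hand. A minimal sketch (the local macro name p99 is ours) that displays the
* 99th percentile and counts the observations the restriction below excludes:

quietly summarize hh_count, detail
local p99 = r(p99)
display "99th percentile of hh_count: `p99'"
count if hh_count >= `p99' & !missing(hh_count)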

xtpoisson hh_count treatment i.phasen#i.villagen if hh_count<568, irr vce(cluster vesseln) exposure(day_count) nolog

*   The estimated IRR and CI for the treatment variable in the regression above
* are reported in SM Table #. 

******                 WEDGEFISH: OUTLIERS                           ********

summarize wf_count, detail

* 99th percentile is 23
* Drop all observations at or above this value and re-estimate

xtpoisson wf_count treatment i.phasen#i.villagen if wf_count<23, irr vce(cluster vesseln) exposure(day_count) nolog

*   The estimated IRR and CI for the treatment variable in the regression above
* are reported in SM Table #. 

******              CONCLUSION: ROBUSTNESS TO OUTLIERS              ********

*   No change in conclusions. The estimates move in positive direction
* (strengthening the conclusions).

****************************************************************************
****          TABLES FROM ANALYSES ABOVE THAT ARE IN THE SM            *****
****************************************************************************

* Specify the independent variables that all the estimators include

global xlist treatment i.phasen#i.villagen

* The six Poisson estimators for retained hammerhead catch

* The original (Fig 2) Poisson estimator

eststo: quietly xtpoisson hh_count $xlist, irr vce(cluster vesseln) exposure(day_count)

*   Create a variable for the conventional M&E estimate of impacts to which we
* compare the estimate from the experiment. Number computed above in comments.

estadd scalar Conventional= 0.956

* The original estimator using only Period 1 data

eststo: quietly poisson hh_count $xlist if phasen==1, irr vce(robust) exposure(day_count)
estadd scalar Conventional= 0.999

* The original estimator using only Recipient data

eststo: quietly xtpoisson hh_count $xlist if (recipient==1), irr vce(cluster vesseln)  exposure(day_count) 
estadd scalar Conventional= 0.945

* Multi-level, mixed-effects estimator

eststo: quietly meglm hh_count $xlist, irr intpoints(12) family(poisson) vce(robust) exposure(day_count) || vesseln:
estadd scalar Conventional= 0.956

* The original estimator using bootstrapped SEs 

eststo: quietly xtpoisson hh_count treatment i.phasen#i.villagen, irr vce(bootstrap, reps(1000)) exposure(day_count)
estadd scalar Conventional= 0.956

* The original estimator removing the top 1% of outcome values

eststo: quietly xtpoisson hh_count treatment i.phasen#i.villagen if hh_count<568, irr vce(cluster vesseln) exposure(day_count) nolog
estadd scalar Conventional= 0.956

esttab using HammerheadsPoisson.txt, replace eqlabels(" " " " " " " " " " " ") eform ci keep(treatment) stats(Conventional N) nostar label title("Estimated Treatment Effects for Hammerhead Sharks, Supplemental Table 1")  mtitles("Original" "Period 1" "Recipients"  "Multi-level" "Bootstrap" "Outliers")  nonotes

* NOTE 6: To get rid of column numbers, add -nonumbers- as option

eststo clear


* The six Poisson estimators for retained wedgefish catch

eststo: quietly xtpoisson wf_count $xlist, irr vce(cluster vesseln) exposure(day_count)
estadd scalar Conventional= 0.294
eststo: quietly poisson wf_count $xlist if phasen==1, irr vce(robust) exposure(day_count)
estadd scalar Conventional= 0.332
eststo: quietly xtpoisson wf_count $xlist if (recipient==1), irr vce(cluster vesseln)  exposure(day_count) 
estadd scalar Conventional= 0.231
eststo: quietly meglm wf_count $xlist, irr intpoints(12) family(poisson) vce(robust) exposure(day_count) || vesseln:
estadd scalar Conventional= 0.294
eststo: quietly xtpoisson wf_count treatment i.phasen#i.villagen, irr vce(bootstrap, reps(1000)) exposure(day_count)
estadd scalar Conventional= 0.294
eststo: quietly xtpoisson wf_count treatment i.phasen#i.villagen if wf_count<23, irr vce(cluster vesseln) exposure(day_count)
estadd scalar Conventional= 0.294

esttab using WedgefishPoisson.txt, replace eqlabels(" " " " " " " " " " " ") eform ci keep(treatment) stats(Conventional N) nostar label title("Estimated Treatment Effects for Wedgefish, Supplemental Table 2")  mtitles("Original" "Period 1" "Recipients"  "Multi-level" "Bootstrap" "Outliers") nonotes

eststo clear

*   The GLS estimators for hammerhead and wedgefish retained catch using raw
* counts and standardized counts by the number of days in the period.

eststo: quietly xtreg hh_count $xlist , re cluster(vesseln)
estadd scalar PercentChange= 0.24
eststo: quietly xtreg hh_count_norm $xlist , re cluster(vesseln)
estadd scalar PercentChange= 0.26
eststo: quietly xtreg wf_count $xlist , re cluster(vesseln)
estadd scalar PercentChange= -0.17
eststo: quietly xtreg wf_count_norm $xlist , re cluster(vesseln)
estadd scalar PercentChange= -0.12

esttab using GLS.txt, replace eqlabels(" " " " " " " ") ci keep(treatment) stats(PercentChange N) nostar label title("Estimated Treatment Effects for Hammerheads and Wedgefish, Supplemental Table 3") mtitles("Hammerheads" "Hammerheads/Day" "Wedgefish" "Wedgefish/Day") nonotes

eststo clear


******************************************************************************
******         EXPLORATORY ANALYSES TO ELUCIDATE MECHANISMS           ********
******************************************************************************

*   The numbers in the comments that follow are based on the live release data 
* file Release_1.csv 

*   93% of the vessels that received live release payments were from the three 
* villages in the regency Aceh Jaya. They also accounted for 93% of the payment
* days (a payment day is a date on which the vessel received at least one 
* payment).

*  Is the period-level correlation of retained catches of hammerhead and 
* wedgefish within each vessel different in the two regencies?

corr hh_count wf_count if siten==1
corr hh_count wf_count if siten==2

*   Yes, a lot different. The correlation for Aceh Jaya is high but it is near 
* zero for East Lombok. Thus the potential for spillovers in the villages 
* where live releases were common is high.

*   If targeting leads to spillovers, then rho will be higher in the T=1
* condition than in the T=0 condition.
*   i.e., is the pattern above stronger in the treated condition than in the
* control condition for villages that received most of the payments, whereas 
* that difference is not observed in villages that did not receive much
* payment?

corr hh_count wf_count if siten==1 & treatment==1
corr hh_count wf_count if siten==1 & treatment==0

corr hh_count wf_count if siten!=1 & treatment==1
corr hh_count wf_count if siten!=1 & treatment==0

*   Yes, we see an increase in the correlation in the treatment condition
* but only in the villages getting most of the live release payments.

*   Let's look a little deeper at the distribution of live release payments.
* Based on information about the villages from which recipients were coming,
* we can explore the patterns to see if we can infer something about
* hidden action and potential cross-species spillovers. We need to recognize
* we don't have a lot of statistical power for subgroup analyses and we're
* not controlling for the family-wise Type 1 error rates. We're just
* exploring the patterns in the estimated ATE% and the correlations between
* period-level retained hammerhead and retained wedgefish within vessels.

*   Village 3 received most of the HH payment days (67/72=93%) and 341 out of
* 364 individual live releases of HH (94%).
*   Village 3 received hardly any WF payments: 1 boat on 5 days (6 fish total).
*   Receiving most of the WF payments were Village 2 (84/119 = 71% of
* payment days) and Village 4 (28/119=24%). These two villages accounted for 467 
* of the 475 individual live releases of WF (98%). 
*   Village 2 received hardly any HH payments: 2 boats, 1 day each 
* (18 fish total, and 1 boat only released 1 HH)
*   Village 4 received hardly any HH payments: 4 boats, 1 day each (5 fish 
* total).

*   Let's consider the two villages that received most of the WF payments.
*  Recall that they released very few live hammerheads (23 total).
*   Can we reject the null of zero effect on hammerheads?
*   If so, we would have evidence consistent with a spillover effect.
* i.e., Hidden action spurs vessels to catch more wedgefish to obtain payments,
* and that action also results in more hammerheads being retained.

xtpoisson hh_count treatment i.phasen#i.villagen if (villagen==2 | villagen==4), irr vce(cluster vesseln) exposure(day_count) nolog

*   Despite few live releases of hammerheads in these villages, we see an
* increase in retained catch of hammerheads, consistent with a spillover effect
* from wedgefish to hammerheads.

*   The boats in these two villages released 467 wedgefish and only had
* retained wedgefish of 116 in the treated periods, thus the conventional 
* approach to estimating the ATE% based on live releases only would imply that 
* the offer of payments reduced retained wedgefish by 467/(116+467)= 80%.

xtpoisson wf_count treatment i.phasen#i.villagen if (villagen==2 | villagen==4), irr vce(cluster vesseln) exposure(day_count) nolog

* 80% reduction is an IRR of 0.20, or a coefficient of ln(0.20)=-1.6094

test treatment = -1.6094

*   Yes, the estimated ATE% is consistent with a spillover effect.
*   Almost all of the 95% CI is in negative orthant, suggesting effort to
* reduce overall retained wedgefish catch, and we can reject the expected 
* reduction from the conventional M&E approach that only looks at live releases
* and assumes no change in behavior.

*   So these results overall suggest that at least some of the ATEs we're 
* observing arose from a mix of hidden action related to the wedgefish 
* payments and a subsequent spillover to the hammerhead catch.

*   What about hammerhead payments themselves?
*   Village 3 received most of the hammerhead payments, but we do not
* have a lot of statistical power from this one village of 13 vessels.
* But we can see if we can detect any hidden action related to hammerhead
* effort.

xtpoisson hh_count treatment i.phasen#i.villagen if (villagen==3), irr vce(cluster vesseln) exposure(day_count) nolog

* The estimated effect is an increase of 46%, but the CI is wide.

*   Recalculating the ATE% using the conventional M&E approach and just vessels
* from this village, we have ATE% = 341/(1744+341) = 16.36% reduction, which
* is the same as an IRR = 0.8365 or coefficient of ln(0.8365)=-0.1785

test treatment = -0.1785

// Effect on wedgefish (i.e., potential spillover effect)

xtpoisson wf_count treatment i.phasen#i.villagen if (villagen==3), irr vce(cluster vesseln) exposure(day_count) nolog

*   The estimated effect is an increase of 48%, but the CI is wide.
* i.e., the estimated ATE% is positive and thus consistent with a spillover,
* but the statistical evidence is weak.

*   What about the budget-relaxation mediation path? Recipients are paid every
* week, so some vessels were paid money during their first treated period
* but fishing investments take some time to finalize and they are likely to be
* cumulative, perhaps in a nonlinear fashion.
*   If budget-relaxation is an important pathway, we would expect the ATE% to 
* be larger in the third and fourth periods (i.e., the second round of
* treatment) than in the first and second periods, although we acknowledge our 
* design does not have a lot of statistical power for estimating ATE% by 
* periods.

* Create a dummy variable for the third and fourth periods

* inlist() returns 1 when phasen is 3 or 4 and 0 otherwise
gen byte Period3or4 = inlist(phasen, 3, 4)

*   Now estimate the effect on the two taxa with an interaction of the treatment
* and the Period3or4 variable. If budget-relaxation were an important mediator,
* we should see the IRR on the treatment variable, which now is the ATE% in
* Periods 1 and 2, go down and the interaction term will imply a larger ATE% 
* in Periods 3 and 4.

xtpoisson hh_count i.treatment##i.Period3or4 i.phasen#i.villagen, irr vce(cluster vesseln) exposure(day_count) nolog

*   We see the opposite pattern from the one expected if budget relaxation were 
* an important driver. 
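
*   The implied IRR for Periods 3 and 4 is the product of the treatment IRR
* and the interaction IRR. A minimal sketch using -lincom-, assuming the
* interaction coefficient 1.treatment#1.Period3or4 was estimated (not omitted)
* in the hammerhead regression above. -capture noisily- keeps the do file
* running if that assumption fails.

capture noisily lincom 1.treatment + 1.treatment#1.Period3or4, irr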

xtpoisson wf_count i.treatment##i.Period3or4 i.phasen#i.villagen, vce(cluster vesseln) exposure(day_count) nolog

* Warning: variance matrix is nonsymmetric or highly singular and so Stata 
* doesn't report standard errors for the IRRs. It's arising due to the sparsity
* in the village-period interactions given so many vessels are retaining zero
* catch. The punchline is that we also see the opposite pattern of the 
* estimated coefficients from the one expected if budget relaxation were an 
* important driver. We're not interested in the standard errors here.



****************************************************************************
**************            SUMMARY STATISTICS                      **********
****************************************************************************

* Describe period length (exposure)

xtsum day_count

* Observation in M&M
*         There are 261 vessel-period observations.
* Observation in M&M
*         On average, we observe vessels for three periods (T-bar)
* Observation in M&M
*         The periods are, on average, 107 days long.

* For readers unfamiliar with xtsum (from Stata help text):
*    xtsum decomposes the variable x_it into a between (xbar_i) and 
* within (x_it - xbar_i + xbar) component, the global mean xbar (e.g., 107)
* being added back in to make the results comparable.
*   Between SD is SD across vessels (SD^2=variance)
*   Within SD is SD over time for a given vessel (SD^2=variance)
*   Overall variance is roughly between^2 + within^2, so the overall SD is
* roughly sqrt(between^2 + within^2)
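
*   A quick numerical check of that decomposition, using the results that
* -xtsum- stores in r(). A minimal sketch; the two numbers should be close
* but need not match exactly:

quietly xtsum day_count
display "Overall SD:                 " r(sd)
display "sqrt(between^2 + within^2): " sqrt(r(sd_b)^2 + r(sd_w)^2)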

*   How many vessel observation opportunities are there? i.e., the potential 
* landing days on which retained catch could have been observed by the field
* team at the landing docks.

* First summarize the data and then display the sum of days.

summarize day_count, detail
display r(sum)

* Observation in M&M
*        27,992 vessel observation opportunities.

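* Total number of vessels in the analysis
* (-unique- is a community-contributed command; if it is not installed, run
* -ssc install unique- first)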

unique vesseln

* Distribution of vessels across regencies
* 1=AJ (Aceh Jaya), 2=TL (East Lombok)

unique vesseln if siten==1
unique vesseln if siten==2

* Observation in M&M
*        Numbers above are reported. 

* Distribution of vessels across villages

unique vesseln if villagen==1
unique vesseln if villagen==2
unique vesseln if villagen==3
unique vesseln if villagen==4
unique vesseln if villagen==5

* Numbers above are not reported.

* The distribution of observations across villages, regency (site), and periods.

tabulate siten
tabulate villagen
tabulate villagen siten
tabulate phasen

* Observation in M&M
*      60% of observations were from East Lombok (2=TL)



******                   HAMMERHEADS SUMMARY STATISTICS                *******

* How many hammerheads retained in entire experiment

summarize hh_count
display r(sum)

* Summarize Hammerhead retained catch normalized by period length

*   Calculate average number of retained hammerheads per day per boat.
*   In other words, across all boats the average of the average 
* numbers of retained hammerheads per day.

*   If I selected a boat at random and a day at random, how many retained 
* hammerheads would I expect to observe caught and retained on that day?
 
xtsum hh_count_norm

*    The overall and within are calculated over 261 vessel-periods of data. 
*  The between values are calculated over 86 vessels, 
*  and the average number of periods a vessel was observed is 3.035 
*  xtsum also reports minimums and maximums. The average number of hammerheads 
*  retained per day in a period varied between 0 and 8.58. 
*  Average number of hammerheads retained per day for each vessel varied 
*  between 0 and 5.57. 
*    HH retained within varied between -2.57 and 3.49, which doesn't mean that
*  any vessel retained negative fish. The within number refers to the
*  deviation from each vessel's average and some of those deviations must be
*  negative. That means that some vessel deviated by quite a lot from its
*  average. In the definition of within variation, we add back in the 
*  global average, for hh_count_norm, of 0.48, so some vessel deviated
*  from its average by 3.01 fish per day (3.49-0.48).
*    The standard deviations say that the variation in fish caught in a period 
*  across vessels is almost twice the SD observed within a vessel over time.
*  That is, for two randomly chosen vessels from the sample, 
*  the difference in retained HH is expected to be nearly twice 
*  the difference for one vessel in two randomly selected periods.
*   That pattern implies that having the within-vessel data helps increase
*  statistical power in the HH estimation.

* How many observations are zeroes out of the total?

count if hh_count_norm~=.
count if hh_count_norm==0

*   Zero is most frequent but there are not a lot of zeroes in comparison to 
* all the other values
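
*   The two counts above can be combined into a share. A minimal sketch (the
* local macro name n_obs is ours):

quietly count if !missing(hh_count_norm)
local n_obs = r(N)
quietly count if hh_count_norm==0
display "Share of zero observations: " %5.3f r(N)/`n_obs'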

* In another file, HH_Count_TotalsandReleases.log, we can see that only 5
* boats out of 86 never had a recorded retained hammerhead during the 
* experiment

*   Compute autocorrelations for hh_count, which gives us a sense of strength 
* of serial correlation. Lagged one period.
* Standardize first by day_count to make them comparable across periods.

corr hh_count_norm L.hh_count_norm


******                      WEDGEFISH SUMMARY STATISTICS                *******

* How many wedgefish retained in entire experiment

summarize wf_count
display r(sum)

* Summarize wedgefish retained catch normalized by period length

*   Calculate average number of retained wedgefish per day per boat.
*   In other words, across all boats the average of the average 
* numbers of retained wedgefish per day.

*   If I selected a boat at random and a day at random, how many retained 
* wedgefish would I expect to observe caught and retained on that day?
 
xtsum wf_count_norm

*  To understand this output, use the text above for -xtsum hh_count_norm- .

* How many observations are zeroes out of the total?

count if wf_count_norm~=.
count if wf_count_norm==0

*   Zero is most frequent and there are a lot of zeroes.

* In another file, WF_Count_TotalsandReleases.log, we can see that 31 boats
* out of 86 never had a recorded retained wedgefish during the experiment.

*   Compute autocorrelations for wf_count, which gives us a sense of strength 
* of serial correlation. Lagged one period.
* Standardize first by day_count to make them comparable across periods.

corr wf_count_norm L.wf_count_norm


******   Summarize the Hammerhead retained catch by treatment condition *******

xtsum hh_count_norm if treatment==1
xtsum hh_count_norm if treatment==0

* How many retained in treatment period

summarize hh_count if treatment==1

display r(sum)

* How many retained in the control period
summarize hh_count if treatment==0

display r(sum)

* For the treated condition, there were 7,830 hammerheads retained over 
* 14,269 landing days or 0.55 hammerheads/day.

* For the control condition, there were 5,717 hammerheads retained over 13,723
* landing days or 0.417 hammerheads/day.
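
*   The per-day rates quoted above can be recomputed from the data. A minimal
* sketch (the local macro name fish is ours), assuming hh_count and day_count
* are nonmissing for the same observations:

foreach t in 1 0 {
    quietly summarize hh_count if treatment==`t'
    local fish = r(sum)
    quietly summarize day_count if treatment==`t'
    display "treatment=`t': " %6.3f `fish'/r(sum) " hammerheads/day"
}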

******   Summarize the Wedgefish retained catch by treatment condition *******

xtsum wf_count_norm if treatment==1
xtsum wf_count_norm if treatment==0

* How many retained in treatment period

summarize wf_count if treatment==1

display r(sum)

* How many retained in the control period
summarize wf_count if treatment==0

display r(sum)

* For the treated condition, there were 198 wedgefish retained over 
* 14,269 landing days or 0.0139 wedgefish/day.

* For the control condition, there were 231 wedgefish retained over 13,723
* landing days or 0.0168 wedgefish/day


//////////////////////////////////////////////////////////////////////////////
//* Why gamma distribution for random effects in Poisson estimator rather than
//* normal distribution? Because normal distribution underpredicts the observed
//* counts
//////////////////////////////////////////////////////////////////////////////

* Hammerheads with gamma distribution of random effects
quietly xtpoisson hh_count treatment i.phasen#i.villagen, irr vce(cluster vesseln) exposure(day_count)
predict double fitted, xb
gen double yhat=exp(fitted)
egen meany=mean(hh_count), by(vesseln)
egen meanyhat=mean(yhat), by(vesseln)
gen double exp_alpha = meany/meanyhat if meanyhat>0
replace yhat=yhat*exp_alpha if meanyhat>0
su hh_count yhat
ge d = yhat - hh_count
scatter d hh_count
drop fitted d yhat exp_alpha meanyhat meany

* Hammerheads with normal distribution of random effects
quietly xtpoisson hh_count treatment  i.phasen#i.villagen, irr vce(cluster vesseln) normal exposure(day_count)
predict double fitted, xb
gen double yhat=exp(fitted)
egen meany=mean(hh_count), by(vesseln)
egen meanyhat=mean(yhat), by(vesseln)
gen double exp_alpha = meany/meanyhat if meanyhat>0
replace yhat=yhat*exp_alpha if meanyhat>0
su hh_count yhat
ge d = yhat - hh_count
scatter d hh_count
drop fitted d yhat exp_alpha meanyhat meany

* Wedgefish with gamma distribution of random effects
quietly xtpoisson wf_count treatment  i.phasen#i.villagen, irr vce(cluster vesseln) exposure(day_count)
predict double fitted, xb
gen double yhat=exp(fitted)
egen meany=mean(wf_count), by(vesseln)
egen meanyhat=mean(yhat), by(vesseln)
gen double exp_alpha = meany/meanyhat if meanyhat>0
replace yhat=yhat*exp_alpha if meanyhat>0
su wf_count yhat
ge d = yhat - wf_count
scatter d wf_count
drop fitted d yhat exp_alpha meanyhat meany


* Wedgefish with normal distribution of random effects
quietly xtpoisson wf_count treatment  i.phasen#i.villagen, irr vce(cluster vesseln) normal exposure(day_count)
predict double fitted, xb
gen double yhat=exp(fitted)
egen meany=mean(wf_count), by(vesseln)
egen meanyhat=mean(yhat), by(vesseln)
gen double exp_alpha = meany/meanyhat if meanyhat>0
replace yhat=yhat*exp_alpha if meanyhat>0
su wf_count yhat
ge d = yhat - wf_count
scatter d wf_count
drop fitted d yhat exp_alpha meanyhat meany



****************************************************************************
****               OTHER MISCELLANEOUS NOTES                           *****
****************************************************************************

* NOTE 7: Why not use a negative binomial model? Cameron & Trivedi, Wooldridge, 
* and others argue for Poisson over Neg Bin because no one has shown that
* neg binomial produces robust estimates of the mean effect, whereas it has
* been shown for PQMLE. Neg binomial may be preferred if we wanted to
* compute probabilities (prediction), but we're interested in estimating 
* mean effects of the intervention not predicting retained catch.

* NOTE 8:  Why don't we explore heterogeneity across villages or periods?
* The main reason is because of insufficient statistical power, but also because
* it's not clear what insights such an analysis would provide for science or 
* practice. In terms of insights for practice, the policymaker's goal is 
* to reduce mortality of endangered species and thus it does not matter whether 
* the reduced mortality is produced by all vessels reducing mortality or by a 
* small fraction of vessels responsible for much of the mortality. Understanding
* heterogeneity may be useful for better targeting of future programs, which can
* improve cost effectiveness by purchasing fewer cameras and setting up fewer 
* payment accounts, but for the learning about the current program, the ATE is 
* the most relevant estimand. 

* NOTE 9: If the reader wants to see ATEs in levels rather than percentages, we 
* recommend that they calculate the Average Marginal Effect. 
* Why? We're most interested in the difference between the treated and control 
* distributions of retained catch, i.e., the marginal effect of flipping 
* treatment from 0 to 1. How much of a change in the predicted retained catch
* do we see when we go from no offer to an offer of live release payments?
* i.e., the ATE is the average marginal effect (AME) of changing all vessels 
* from the control condition to the treatment condition. 
* E[exp(x'B)|T=1]- E[exp(x'B)|T=0], where exp is the exponential function.

*   Intuitively, the AME for treatment is computed as follows:
* Go to the first vessel case. Treat that vessel as though it were untreated, 
* regardless of what the vessel's condition actually is. 
*   Leave the period & village variable values (and vessel REs) as they are in 
* the data. Compute the predicted retained fish if this vessel were untreated 
*   Now do the same thing, this time treating the vessel as if it were treated
* The difference in the two predicted retained fish is the marginal effect for 
* that case. Repeat the process for every vessel-period in the sample.
*   Compute the average of all the marginal effects.
*   That average value is the AME for the treatment variable.
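
*   The steps above can be sketched by hand. This is an illustration under
* our own assumptions, not the code used for any reported number: the
* variable names xb0, xb1, and me are ours, the linear prediction includes
* the exposure offset, and the vessel random effects are not added to the
* prediction. The mean of me should match the discrete-change AME from
* -margins, dydx(treatment) expression(exp(predict(xb)))-.

quietly xtpoisson hh_count i.treatment i.phasen#i.villagen, ///
    vce(cluster vesseln) exposure(day_count)
preserve
* Treat every vessel-period as untreated and predict
replace treatment = 0
predict double xb0 if e(sample), xb
* Treat every vessel-period as treated and predict
replace treatment = 1
predict double xb1 if e(sample), xb
* Marginal effect for each case, then the average across cases
generate double me = exp(xb1) - exp(xb0)
summarize me
restore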

*   We are comparing two hypothetical populations: one all treated and one all 
* control but with the exact same phase-village values that we have in the
* sample. Usually AMEs are only controversial when using a variable that 
* really couldn't take on both values for each unit (like sex or race).

* Some sample code for doing those calculations is below

*   The syntax "i." tells stata we're using factor variables. It's not needed
* for treatment variable, which is only 1 or 0, but the margins command for
* calculating the AME requires it. We estimate the AMEs using the margins syntax
* that requests average marginal effects using exponentially transformed 
* coefficients.
* -margins, dydx(treatment) expression(exp(predict(xb)))- 

* Example code
* quietly xtpoisson hh_count i.treatment i.phasen#i.villagen, ///
*                                     vce(cluster vesseln) exposure(day_count)
* margins, dydx(treatment) expression(exp(predict(xb)))

*   If you want to run any statistical test on the AME, add -post- after the
* comma. The "post" option is needed to do a post-estimation hypothesis test on
* the calculated AME value. e.g., something like -test _b[1.treatment] = 5-

*   To get a comparable value from the conventional approach using only live
* releases, we recommend that you calculate, for each vessel, how many
* live releases each vessel had and the duration of their exposure to the
* treatment condition. There are 29 vessels that received payments, so
* each has a positive value for the number of releases per day of exposure
*   The remaining 86-29=57 vessels that didn't receive any payments get a
* "0" for the number of releases per day of exposure (We're assuming that 
* we're dropping the vessel that has no landing observations and so use 86
* rather than 87 total vessels).

*  Then we take the average live releases per day across all vessels in the
* experiment. Using the same assumptions as used in the "Conventional
* Monitoring & Evaluation" approach, this average is also an estimated
* effect of offering live release payments.
*  To make these values comparable to counts, we have to inflate them by the 
* number of days that the average vessel was exposed to the treatment period.

*   To further illustrate what the AME is, one can calculate the Average Adjusted
* Predictions. We would have to run the estimation again.

* quietly xtpoisson hh_count i.treatment i.phasen#i.villagen, ///
*                                      vce(cluster vesseln)  exposure(day_count)
* margins treatment, expression(exp(predict(xb)))

log close
